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Abstract 

By convention, and even more often, as an unintentional consequence 
of design, time distributions of latency and infectious durations in stochas- 
tic epidemic simulations are often exponential. The skewed distribtion 
typically leads to unrealistically short times. We examine the effects of 
altering the distribution latency and infectious times by comparing the 
key results after simulation with exponential and gamma distributions in 
a homogeneous mixing model aswell as a model with regional divisions 
connected by a travel intensity matrix. We show a delay in spread with 
more realistic latency times and offer an explanation of the effect. 
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1 Introduction 



Exponential distributions for latency times and infectiousness times often 
appear with models of infectious diseases, simulated or solved analyti- 
cally. The distribution does not resemble observed distribution of latency 
or infectious times. Depending on the problem at hand, this may be a 
reasonable simplification. For certain questions where the speed of spread 
of the infection is of less importance, this assumption may give perfectly 
satisfactory results. Recently research interest, however, has been directed 
in the initial highly random phase of the epidemic, whereas the final size 
of the epidemic is perhaps of less interest [HE]. In spite of this the ex- 
ponential time assumption has become off-the-wall and many authors, by 
tradition, disregard the consequence of their assumption. 

The reason for the wide spread use is that the exponential distribution 
is inherently "memoryless" [3] which means that future predictions of the 
state of the epidemic in terms of number of latent and infectious individu- 
als etc is based solely on the current state and not on the history of states. 
The probability that 10 people will have fallen ill on Friday depends only 
on how many are ill on Thursday. The state on Wednesday or Tuesday is 
irrelevant. This makes possible a simple stochastic simulation by utilizing 
a Markov process. 

Exponential distributions will appear as a consequence of the assump- 
tion that the rate at which individuals leave a certain state at a certain 
time only depdends on how many individuals is in that state at this time. 
This corresponds to a constant hazard for any individual to leave the state 
is the same as the "memoryless" property. Many authors therefor include 
the exponential distribution assumption more of less unintentionally while 
design the model. 

In this paper we show that the time distribution is vital for accurate 
results and also that, without abandoning a Markov approach, we are 
given some freedom to adapt the distribution to fit real data, using a 
gamma distribution. 

Much work has been done to show the effect of traveling and migration 
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on the evolution of epidemics [H El El EJ . For today's global outbreaks, 
notably the SARS outbreak of 2001, the need to incorporate what informa- 
tion we have on travel networks in our simulations has become increasingly 
apparent. Models that take the Markov approach seem well suited for this 
purpose which was demonstrated by Hufnagel et al. The population is di- 
vided into a number of local regions which can be countries, municipalities 
or other geographic or even social groupings. They are interconnected by 
an infectiousness intensity matrix describing how infection is tranferred 
between regions. This matrix can be estimated from, for example, travel 
data. 

Hufnagel et al. used the catchment area around each international 
airport and within these used a SEIR-model, where every individual can be 
in one of the states susceptible, latent, infectious and recovered. These local 
processes were linked together by the network of international avaiation 
enabling the disease to be transmitted along flight paths. 

Camitz and Liljeros [9] constructed a similar model of a SARS-like 
outbreak in Sweden. In this model the municipal borders were used to 
partition the country. Using detailed travel data between municipalities, 
a travel intensity matrix was estimated and the geograpic spread could be 
studied aswell as the effect of travel restrictions. 

In more detail, the SLIR-model works as follows. The population in 
each municipality is assigned to one of four states, decribing their dis- 
ease state: Susceptible, Latent, Infectious and Recovered. A susceptible 
may become latent with a probability which depends on the number of 
infectious, in his/her own aswell as connected municipalities, depending 
on the intensity of travel between connected municipalities. After being 
infected, the latent individual moves through stages L and R in times cor- 
responding to known latency and infectious times. The actual time for an 
individual will vary randomly about the mean time, which is fixed. The 
crucial point is how these times vary. In [5] [9] the times are picked from 
an exponential distribution. 
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S -> L -> / -> i? 

Indeed we are certainly not the first to introduce the Gamma distri- 
bution in these contexts. Gamma distributions have for a long time been 
stadard in modeling progress of chronic diseases (e.g. cancer) through 
different stages. They have also been used in models of epidemics, see |10| 
for a recent example. But discussion about using this and other distribu- 
tions is lacking in research today. Times with single point distributions 
are sometimes considered a reasonable approximation [111 I12| but for fol- 
lowing the complete dynamics we feel that a variance is necessary. Other 
time distributions have also been used, such as uniform, Log-normal or 
Weibull, the latter two notably differing from Gamma primarily in their 
tails. Such distriutions may be appropriate but wull not be possible to 
model with the Markov approach. 

1.1 The exponential and gamma distribution 

The main drawback with exponential times is a questionable tie to reality. 
The exponential distribution is highly skewed, with high densities for short 
times and a long tail. Empirical latency times and infectious times are 
not exponentially distributed, but rather have a symmetric density about 
their expectation values. Furthurmore, the exponential distribution has 
a quite high variance, equal to its expectation value squared, whereas 
empirical times tend to deviate little from the mean. The dark blue curve 
in [l] shows a plot of the probability density function of the exponential 
distribution as a special case of the gamma distribution, the circumstances 
for this relationship to be explained later. The exponential distribution 
has a single parameter equal to the inverse of the expectation value. 

In practical simulations this will shorten the time of interest. Since the 
median is lower than the expectation value implying that most times will 
be shorter than the expected. For example, say the expectation value of 
the latency time is set to 5 days. With an exponential distribution, 63% 
of the random times will be shorter than 5 days. 18% will be shorter than 
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1 day. Such short times are clearly unrealistic and furhtermore, latency 
times are expected to fall symmetrically about the mean. 

The additional disadvantage in stochastic epidemic simulations is that 
the outcome is highly dependent on the initial stages. Individuals with 
short latency times will predominantly make up the initially infected and 
will inevitably speed up the outbreak. That is to say that the skewness of 
the exponential distribution is dominant in the early stages of the simula- 
tion whereas the expectation value is not apparent until the stochasticity 
has averaged out. 

A few authors have proposed that the gamma distribution be used 
instead[ref]. The gamma distribution, denoted T(k, 6) has two parame- 
ters, a shape parameter k and a scale parameter 9. For integer k:s the 
probability density function takes on a particularly simple form: 



f(t;n,6) = * [)T « 

The mean is kB and variance k9 2 . For k = 1 the gamma distribution is 
in fact identical to the exponential distribution. Keeping the expectation 
value constant, with larger ns, the gamma distribution becomes increas- 
ingly symmetric about its mean and start to resemble times distributions 
we have learnt to expect. The skewness of the density function is infact 

2/yfR. 

The gamma distribution can actually be realized with an uncompli- 
cated extension of a Markov model such as the one in [9]. The sum of 
several exponentially distributed times is in fact gamma-distributed. This 
is expressed as follows. Let X\ . . . X n be independant stochastic variables 
from an exponential distribution Exp(£). Then, Y — X^Lj Xi belongs to 

r(n,c). 

In practice, instead of having only a single latency stage and a single 
infectious stage, we add stages, forcing each individual to go through sev- 
eral stages of latency before becoming infectious, and in the same manner, 
several stages of infectiousness before recovering. Thereby we achieve an 
arbitrarily symmetic time distribution with a minimal alteration to our 
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SLIR-model. In doing so we alter k which is as shown is equivalent to 
the number of stages. These stages have no epidemiological meaning but 
serve only to change the appearance of the time-distribution. 

At the same time we have to decrease £ so as to keep constant the 
expected time which is n£ in the gamma distribution. We can select any 
k we like to produce a good enough fit to an empirical distribution, or 
at the very least, the mean and variance. The cost of added stages is 
of course memory requirements but happily the simulation time is not 
influenced to a degree to be a deferent in any way. This is due to one 
of the key advantages of the Markov approach. We do not have to keep 
track of any individuals in the model. We simply record their number in 
eash state. 

Using a modified version of [9 J we show that ignoring the shape of the 
time distribution devalues the results, comparing the results for different 
k for both latency times and infectious times. The difference in absolute 
terms is significant. 

2 Data and Methods 

We carried out two sets of four simulations, each consisting of 1000 realiza- 
tions of an outbreak inititated with one infected individual in Stockholm. 
In the first set we confined the population of Stockholm allowing us to test 
the change employing the gamma distribution in a single locality random- 
mixing situation not complicated by travel. There is no specific reason 
for using Stochkolm either as the origin of infection or as a comfinement. 
The mixing model is the same in all municipalities. 

In the second set, we used the full travel network for a full scale sim- 
ulation. In each set we ran a reference simulation with both the latency 
and infectious times distributed according to an exponential distribution 
with the mean 5 days. Except for different parameters, this setup corre- 
sponds exactly to the one used in [camitz]. The other three had gamma- 
distributed latency times, infectious times or both. In the case of gamma 
distributions, k = 3 was used. £ was adjusted to attain an expected time 
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of, again, 5 days. 

The inter-municipal infectiousness matrix is the same as in Camitz 
& Liljeros [9]. It is based on an interview survey conducted in Sweden 
between 1999 and 2001 containing some 35000 journeys. This resulted in 
approximately 12000 matrix elements 7y each estimated with 



where Mij is the number of journeys per day from municipality i to j and 
7 is a global scalar [5]. 

The disease is a Active moderatly infectious disease with an Ro of 2.5, 
within every homogenous subpopulation. 

To describe the state of the epidemic we introduce the vector S to keep 
track of the number of susceptibles in each municipality. Additionally, two 
sets of vectors Li . . . L K and Ii . . . Ia are defined to keep track of latents 
and infectious. The indexes k and A are the chosen first paramenters for 
the gamma distribution for latency and infectious times respectively. We 
will use a general formalism for the time being but later we set the param- 
eters to either 1 or 3. In the first case, corresponding to an exponential 
distribution, there will only be one vector in the set. If k is greater than 
unity, then this will be the number of stages of latency or infectioussness 
that each individual needs to pass through. The sizes of each vector is of 
course equal to the number of municipalities. Let P be this number. The 
dimensionality of the entire state space is equal to D = P ■ (1 + k + X). 
The vectors are indexed as Ik,i (italisized when indexed with i) where i 
is the municipality and k is the stage of disease. For any purposes they 
can be treated as tensors or matrices. Summing over all fes and is yields 
in this case the total number of infected. Note that recording the number 
of recovered individuals is redundant since it is simply the sum of the 
number in the three states of infectiousness already covered, subtracted 
from the population. 

At the start of the run the element Si of S is equal to the population 
sizes Ni of each municipality. This is the initial state in each run. For 
each municipality we now have 1 + k + A possible state transitions, each 
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involving incrementing an element corresponding to the municipality in 
one vector and decrementing the "preceding". This is true for all tran- 
sitions except from the last stage of infectiousness which of course only 
involves a decrement. 

We are now ready to set up the equations that will define the transition 
matrix of our Markov process. The quantities Q? k below, is for each 
municipality i the intensity of individuals passing on to the next stage of 
illness and are connected to the probabilities of the corresponding state 
transitions. X 6 {Ci . . . C K ,Ti . . .T\,1Z} is a label signifying transitions 
to one of the latency states, one of the infectious states or the recovered 
state. It is written in a calligraphic font to avoid confusion with Lk, Ik 
and R which are vectors. 



Qf 2 


— VKLl ; i 
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= vkL k -\^ 


Qf 1 


— VK,L K ,i 




= p\h,i 




= PXI\-i,i 


Q n 





(3) 



Finally, people are infected (become latent) with the intensity that de- 
pends number of infected in all the municipalities and the travelintensity 
between each of them: 



Qf 1 = 



k NX 

'^hi + Y^ Tij Ik i 
k=l 



2 = 1 

2^< 



Ni 



(4) 



In the equations above, a is the the expected number of secondary 
infected per infectious, v is the inverse latency period and (3 the recovery 
rate. The second row reads: The number of people per unit time leaving 
the first latency stage is the number of people in that stage times the 
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number of stages times the scalar rate v. The last row is similar, as is 
the first term of the first row but summed over all infectious stages and 
also includes a factor to account for a decreasing number of susceptibles. 
The second term is the contribution from other municipalities via the 
infectiousness network. It includes a sum of infectious individuals over all 
stages and all municipalities but the current. 

Each of these intensitities is the parameter required to specify the 
exponential distribution that yields the timesteps for the corresponding 
transition. The model is now in all respects in place. To simulate we 
would like to take each transition in order and so we are interested to 
know the time At untill the next transition, given the current state. The 
time, one can easily show, is incidentally also exponentially distributed 
with parameter Q equal to the sum of the D intensities in Eqs. [3]and[4l 



At e Exp(Q), (5) 

M 

Q = J2(Qf" + ■ ■ ■ + Of K + Qf fc + ■ ■ ■ + Qf A + Qf)- (6) 

8=1 

To determine which transition occurs at this time we compare the in- 
tensities among themselves. The probability of a transition is proportional 
to the relative value of the corresponding intensity, simply the intensity 
normalized by Q. So in each pass through the main loop of the algorithm 
we find Q, pick a random time step from the exponential distribution 
specified by Q as a parameter, randomly pick a transition according to 
the relative value of the intensities, update the state vectors and the inten- 
sities according to the new state and start again. The simulation proceeds 
this way until an arbitrarily chosen time limit is reached or until there are 
no more infectious or latent, which ever comes first. In our case we chose 
60 days as by this time a substantial majority of simulated scenarios will 
have developed into epidemics. Recall that the object of interest in not 
the final size but any delay in time of the epidemics. 
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3 Results 



The prevalence, along with some additional results, from the first set of 
four simulations confined to Stockholm is presented in table [U It clear 
that the shape of the time-distribution determines the outcome of the 
simulated epidemic. What is more, the prevalence after 60 days follows 
the anticipated pattern, decreasing with more realistic latency times and 
increasing with more realistic infectious times. 

The results of the second simulation set is presented in figure [3] as 
a geographic plot over Sweden with each municipality represented by a 
colored dot. The prevalence is represented by color on a logarithmic scale. 
Again, the incidence and geographic spread is highly dependent on the 
shape of the time-distribution, see also [III As with the first set, the 
order of severeties after 60 days is the anticipated but th effects are even 
more apparent. Retransmission from connected municipalities amplifies 
the distribution effects. 

Remember that there is a time limit of 60 days and that different 
latency time distributions do not necessarily affect the height of the in- 
cidence peak, only when it occurs. We also added a figure for additional 
support with k simultaneously varied from 1 to 20. 

4 Discussion 

Considering first the simpler case of a single municipality, the extremely 
short latency times generated by the exponential distribution was ex- 
pected to accelerate the epidemic. More individuals become infectious 
early in the simulation, in turn infecting others earlier. It can be shown, 
however, that with shorter mean latency time the final size of the epi- 
demic is unchanged. With a skewed infectious time distribution the effect 
is reversed. The epidemic will be delayed, at least initially, due to the 
abundance of very short infectious times. Each infected will infect a fewer 
number of secondary infecteds before recovering. It is harder for the epi- 
demic to catch on and the probability of the disease dying out completely 
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Figure 1: Probability distribution of the Gamma distribution for varying k, all with 
expectation value 5. The special case of k = 1 produces an exponential distribution. 

People enter tunnel at a constant rate, 1000 per day 
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Figure 2: A simulation of people entering a cave at a rate 1000 per day at speeds 
selected from, in the case of the blue curve, an exponential distribution and in the 
case of the red curve, a gamma distribution with k = 3. The number of people 
simultaneously in the cave is plotted. The expected passage time for both curve is 
5 days which gives the same number of people in the cave after a transitional phase. 
The transitional phase differs, however. 
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Figure 3: Image visualizing the epidemic state after 60 days simulation, averaged 
over 1000 runs. The form parameter for latency times increase from the left to right 
column and for infectious times from the bottom to top row. The prevelence in each 
municpality is color coded on a logarithmic scale. Clearly a more realistic latency time 
distribution delays the epidemic significantly. 
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A 


K 

1 3 


Cumulative incidence 3 

1 


10099 4174 
3816 1968 


Prevalence 3 

1 


2751 1021 
887 419 


Mean time for extinction (days) 3 

1 


5.4 5.6 
5.3 4.5 


Number of extinction runs 3 

1 


230 217 
359 390 



Table I: Results for epidemic confined to Stockholm, essentially a homogeneous dis- 
persion model. The figures follow the predicted behaviour. Note the differences in 
monoticity in extinction runs and mean time for extinction. We attribute differences 
in extinction runs and and mean time for extinction across rows (equal A to random 
variance and as an effect of the cut-off time, as they should theoretically be equal. 
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A 


K A 
1 3 


Cumulative incidence 1 

3 


718830 140530 
184240 44806 


Prevalence 1 

3 


212600 35263 
46341 9828 


Mean incidence in municipalitites 1 

3 


736 122 
160 34 


Mean time for extinction (days) 1 

3 


4.4 3.9 

3.5 3.3 


Number of afflicted municipalities 1 

3 


279.3 250.7 
249.0 190.6 


Number of extinction runs 1 

3 


95 99 
241 295 


Fraction infected from 1 
another municipality (%) 3 


71 72 
33 33 



Table II: The results for the full simulations over all municipalities. The behaviour 
exhibited in the single municipality simulations is even more apparent here which 
means that retransmission from connected municipalities amplifies the distribution 
effects. 
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is higher. 

Although this may not be immediately apparent, longer times also 
means more individuals in the different stages. Figures [2] and [5] exemplify 
this. Here we've simulated people walking through a tunnel. As people 
emerge from the tunnel we count the number still inside. Everybody walks 
at different speeds. The time it takes for them to get to the other side is 
random, on average 5 days, but in one case (red curve) the time is taken 
from a gamma distribution and in the other, blue curve, an exponential 
distribution. In the first graph, people enter at a constant rate of 1000 per 
day. As can easily be visualized, after a while a steady state is reached 
where both distributions give rise to the same number of people inside the 
tunnel. Afterall, the average time is 5 days in both cases - in the steady 
state, as many should exit as enter the cave, 1000 per day, regardless of 
the distributions. What is more interessting is before the steady state is 
reached. Here the high number of speeders in the exponential case clearly 
make their mark in the statistics, quickly exiting the cave and leaving a 
fewer number left inside. Only after the steady state has been are the 
slow-walkers inside sufficiently numbered to make up for the speeders. 

Since we are dealing with stochastic simulations, the events are ran- 
dom. The crucial period is the initial phase of the simulated epidemic 
which is decisive for the future evolution of the epidemic, both speed and 
proportions. As there are very few infecteds the intitial phase of the out- 
break proceeds in a highly random fashion. After the initial phase the 
process smooths out and becomes more predictable and familiar. When 
considering the effects of changing the distributions it is important to 
consider effects which befalls the initial phase but are evened out as more 
people become infected. 

The first graph illustrate the impact of the gamma versus the expo- 
nential distribution but respresents an endemic scenario. The case of an 
outbreak is different as rate of people becoming infectious is not constant, 
but rather grows exponentially. In the second graph, people enter not 
at a constant rate, but at an exponentially increasing rate such that the 
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Form parameter (k=X) 



Figure 4: Here the form parameter for both latant and time distributions are set 
equal. Cumulative incidence i.e. the total number of infected, is plotted below for 
each setting. 
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Figure 5: Analogous plot as figure[2]but with exponentially increasing entrance rate, 
more suited to portraying epidemic growth. As long as the rate increases this way, 
there will never be a steady state where the number of people in the cave for the two 
cases are equal. 
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rate of entrance every week is ten times what it was the previous week 
and the tunnel will ever be more and more packed with people. As long 
as the groth rate does not wane, a steady state is never reached. The 
slow-walkers will never compensate for the speeders and the number of 
people in the gamma-tunnel climbs faster than in the exponential tunnel. 

In a multi-municipal model the dynamics are more complex and our 
simplistic cave-model does not offer any enlightenment. The basic be- 
haviour, though, is expected to follow along the same lines as in the single 
municipality case and the arguments are similar, but to what extent to 
is not immediately certain. Intuition tells us that the combined effect of 
two contributions is more than the sum. We may therefore expect a high 
incidence in when the infectious period is prolonged due to the combined 
contribution of more numerous infectious and the amount of traveling they 
have time with during their infectious period. As it turns out, the results 
of our simulations agrees with preliminary guesses. 

We should mention that the gamma-distribution is perhaps not the 
only choice of modelers. Many alternatives have been proposed such as 
the Log-normal distribution and Weibull distributions. All three have sim- 
ilar plots but differ some in key points also as regards to the behaviour 
of the tails. As we have illustrated the tails of the assumed distribution 
is important for the outcome of the simulations. The effect of these dif- 
ferences for epidemic models have not been studied to our knowledge. 
None of these, however, would be compatible with our modeling approach 
which uses stages. In that respect, our choice is as much a consequence 
of design as deliberate choice, as is the exponential distribution to other 
model. The significant improvement of the model shown in this paper, 
while retaining the Markov model. Possible benefits of alternative choices 
of distributions will be for future experiments to show. 
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